Background

This file is designed to analyze coronavirus data from a single state using three data sources:

Code for processing data from each of these sources is available in:

The goal of this file is to download updated data for the three main data sources, and to explore the performance of the segments as measured against the new data.

Functions are sourced and a variable mapping file is created:

# All functions assume that tidyverse and its components are loaded and available
# Other functions are declared in the sourcing files or use library::function()
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# If the same function is in both files, use the version from the more specific source
source("./Coronavirus_Statistics_Functions_Shared_v003.R")
source("./Coronavirus_Statistics_Functions_CTP_v003.R")
source("./Coronavirus_Statistics_Functions_USAF_v003.R")
source("./Coronavirus_Statistics_Functions_CDC_v003.R")

# Create a variable mapping file
varMapper <- c("cases"="Cases", 
               "newCases"="Increase in cases, most recent 30 days",
               "casesroll7"="Rolling 7-day mean cases", 
               "deaths"="Deaths", 
               "newDeaths"="Increase in deaths, most recent 30 days",
               "deathsroll7"="Rolling 7-day mean deaths", 
               "cpm"="Cases per million",
               "cpm7"="Cases per day (7-day rolling mean) per million", 
               "newcpm"="Increase in cases, most recent 30 days, per million",
               "dpm"="Deaths per million", 
               "dpm7"="Deaths per day (7-day rolling mean) per million", 
               "newdpm"="Increase in deaths, most recent 30 days, per million", 
               "hpm7"="Currently Hospitalized per million (7-day rolling mean)", 
               "tpm"="Tests per million", 
               "tpm7"="Tests per million per day (7-day rolling mean)", 
               "cdcExcess"="Excess all-cause (CDC)", 
               "ctp_death7"="COVID Tracking Project", 
               "usaf_death7"="USA Facts",
               "CDC_deaths"="CDC total deaths",
               "CDC_excess"="CDC excess deaths",
               "CTP_cases"="COVID Tracking Project cases",
               "CTP_deaths"="COVID Tracking Project deaths",
               "CTP_hosp"="COVID Tracking Project hospitalized",
               "CTP_tests"="COVID Tracking Project tests",
               "USAF_cases"="USA Facts cases", 
               "USAF_deaths"="USA Facts deaths",
               "vpm7"="Per million people (7-day rolling daily average)",
               "vpm"="Per million people"
               )

Data Updates (early November)

New data from COVID Tracking Project are downloaded and assessed against the existing state-level segments:

# Use existing segments with updated data
locDownload <- "./RInputFiles/Coronavirus/CV_downloaded_201108.csv"
test_old_201108 <- readRunCOVIDTrackingProject(thruLabel="Nov 7, 2020", 
                                               downloadTo=if (file.exists(locDownload)) NULL else locDownload,
                                               readFrom=locDownload, 
                                               compareFile=readFromRDS("test_hier5_201025")$dfRaw,
                                               useClusters=readFromRDS("test_hier5_201025")$useClusters
                                               )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   state = col_character(),
##   totalTestResultsSource = col_character(),
##   dataQualityGrade = col_character(),
##   lastUpdateEt = col_character(),
##   dateModified = col_datetime(format = ""),
##   checkTimeEt = col_character(),
##   dateChecked = col_datetime(format = ""),
##   fips = col_character(),
##   hash = col_character(),
##   grade = col_logical()
## )
## i Use `spec()` for the full column specifications.
## Warning: 7520 parsing failures.
## row          col   expected               actual                                                 file
##   3 dateModified valid date 2020-11-07T24:00:00Z './RInputFiles/Coronavirus/CV_downloaded_201108.csv'
##   3 dateChecked  valid date 2020-11-07T24:00:00Z './RInputFiles/Coronavirus/CV_downloaded_201108.csv'
##   4 dateModified valid date 2020-11-01T24:00:00Z './RInputFiles/Coronavirus/CV_downloaded_201108.csv'
##   4 dateChecked  valid date 2020-11-01T24:00:00Z './RInputFiles/Coronavirus/CV_downloaded_201108.csv'
##   5 dateModified valid date 2020-11-07T24:00:00Z './RInputFiles/Coronavirus/CV_downloaded_201108.csv'
## ... ............ .......... .................... ....................................................
## See problems(...) for more details.
## 
## File is unique by state and date
## 
## 
## Overall control totals in file:
## # A tibble: 1 x 3
##   positiveIncrease deathIncrease hospitalizedCurrently
##              <dbl>         <dbl>                 <dbl>
## 1          9761373        229238               9362912
## 
## *** COMPARISONS TO REFERENCE FILE: compareFile
## 
## Checkin for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checkin for similarity of: states
## In reference but not in current: 
## In current but not in reference: 
## 
## Checkin for similarity of: dates
## In reference but not in current: 
## In current but not in reference: 2020-11-07 2020-11-06 2020-11-05 2020-11-04 2020-11-03 2020-11-02 2020-11-01 2020-10-31 2020-10-30 2020-10-29 2020-10-28 2020-10-27 2020-10-26 2020-10-25
## 
## *** Difference of at least 5 and difference is at least 1%:
## Joining, by = c("date", "name")
##          date             name newValue oldValue
## 1  2020-03-07 positiveIncrease      171      176
## 2  2020-03-11 positiveIncrease      502      509
## 3  2020-03-13 positiveIncrease     1059     1072
## 4  2020-03-18 positiveIncrease     3023     3089
## 5  2020-03-26 positiveIncrease    17544    17720
## 6  2020-03-28 positiveIncrease    19586    19925
## 7  2020-03-29 positiveIncrease    19570    19348
## 8  2020-03-30 positiveIncrease    21691    22042
## 9  2020-04-01 positiveIncrease    26078    25791
## 10 2020-04-06 positiveIncrease    28592    29002
## 11 2020-04-13 positiveIncrease    24758    25195
## 12 2020-04-15 positiveIncrease    29755    30307
## 13 2020-04-16 positiveIncrease    31489    30978
## 14 2020-04-24 positiveIncrease    33698    34231
## 15 2020-05-12 positiveIncrease    22520    22890
## 16 2020-05-13 positiveIncrease    21577    21285
## 17 2020-05-15 positiveIncrease    25371    24685
## 18 2020-05-16 positiveIncrease    23560    24702
## 19 2020-05-17 positiveIncrease    20344    20009
## 20 2020-05-18 positiveIncrease    20812    21028
## 21 2020-05-23 positiveIncrease    22167    21531
## 22 2020-05-24 positiveIncrease    19148    20072
## 23 2020-05-30 positiveIncrease    23443    23682
## 24 2020-06-04 positiveIncrease    20256    20886
## 25 2020-06-05 positiveIncrease    23004    23394
## 26 2020-06-06 positiveIncrease    22773    23064
## 27 2020-06-10 positiveIncrease    20637    20894
## 28 2020-06-12 positiveIncrease    23185    23597
## 29 2020-06-18 positiveIncrease    27135    27746
## 30 2020-06-19 positiveIncrease    30881    31471
## 31 2020-06-21 positiveIncrease    28991    27928
## 32 2020-06-23 positiveIncrease    33848    33447
## 33 2020-07-02 positiveIncrease    53385    54085
## 34 2020-07-06 positiveIncrease    41416    41959
## 35 2020-08-01 positiveIncrease    60416    61101
## 36 2020-08-08 positiveIncrease    53158    53712
## 37 2020-08-14 positiveIncrease    57254    55636
## 38 2020-08-22 positiveIncrease    45722    46236
## 39 2020-09-02 positiveIncrease    30287    30603
## 40 2020-09-07 positiveIncrease    28237    28682
## 41 2020-09-15 positiveIncrease    34879    35445
## 42 2020-09-19 positiveIncrease    44886    45564
## 43 2020-09-20 positiveIncrease    35688    36295
## 44 2020-09-21 positiveIncrease    39062    39472
## 45 2020-09-24 positiveIncrease    43243    43772
## 46 2020-09-27 positiveIncrease    35061    35454
## 47 2020-09-28 positiveIncrease    36056    36524
## 48 2020-09-29 positiveIncrease    36289    36947
## 49 2020-10-21 positiveIncrease    60657    58606
## 50 2020-10-22 positiveIncrease    72887    75248
## Joining, by = c("date", "name")
## Warning: Removed 14 row(s) containing missing values (geom_path).
## 
## 
## *** Difference of at least 5 and difference is at least 1%:
## Joining, by = c("state", "name")
##   state             name newValue oldValue
## 1    FL positiveIncrease   766305   776249
## 2    PR positiveIncrease    31067    61275
## 3    RI positiveIncrease    30581    30116
## Rows: 13,999
## Columns: 55
## $ date                        <date> 2020-11-07, 2020-11-07, 2020-11-07, 20...
## $ state                       <chr> "AK", "AL", "AR", "AS", "AZ", "CA", "CO...
## $ positive                    <dbl> 19306, 202482, 120828, 0, 257384, 95695...
## $ probableCases               <dbl> NA, 30709, 10812, NA, 6590, NA, 7501, 4...
## $ negative                    <dbl> 728589, 1224595, 1308477, 1768, 1608041...
## $ pending                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ totalTestResultsSource      <chr> "totalTestsViral", "totalTestsViral", "...
## $ totalTestResults            <dbl> 747288, 1396368, 1418493, 1768, 1858835...
## $ hospitalizedCurrently       <dbl> 105, 1015, 633, NA, 1139, 3456, 1041, 4...
## $ hospitalizedCumulative      <dbl> NA, 21294, 7415, NA, 22170, NA, 9911, 1...
## $ inIcuCurrently              <dbl> NA, NA, 235, NA, 249, 901, NA, NA, 25, ...
## $ inIcuCumulative             <dbl> NA, 2121, NA, NA, NA, NA, NA, NA, NA, N...
## $ onVentilatorCurrently       <dbl> 8, NA, 105, NA, 137, NA, NA, NA, 14, NA...
## $ onVentilatorCumulative      <dbl> NA, 1225, 865, NA, NA, NA, NA, NA, NA, ...
## $ recovered                   <dbl> 7157, 84471, 106594, NA, 42950, NA, 847...
## $ dataQualityGrade            <chr> "A", "A", "A+", "D", "A+", "B", "A", "B...
## $ lastUpdateEt                <chr> "11/7/2020 03:59", "11/7/2020 11:00", "...
## $ dateModified                <dttm> 2020-11-07 03:59:00, 2020-11-07 11:00:...
## $ checkTimeEt                 <chr> "11/06 22:59", "11/07 06:00", "11/06 19...
## $ death                       <dbl> 84, 3082, 2068, 0, 6147, 17939, 2168, 4...
## $ hospitalized                <dbl> NA, 21294, 7415, NA, 22170, NA, 9911, 1...
## $ dateChecked                 <dttm> 2020-11-07 03:59:00, 2020-11-07 11:00:...
## $ totalTestsViral             <dbl> 747288, 1396368, 1418493, 1768, NA, 195...
## $ positiveTestsViral          <dbl> 21064, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ negativeTestsViral          <dbl> 725850, NA, 1308477, NA, NA, NA, NA, NA...
## $ positiveCasesViral          <dbl> 19306, 171773, 110016, 0, 250794, 95695...
## $ deathConfirmed              <dbl> 84, 2864, 1890, NA, 5730, NA, NA, 3757,...
## $ deathProbable               <dbl> NA, 218, 178, NA, 417, NA, NA, 914, NA,...
## $ totalTestEncountersViral    <dbl> NA, NA, NA, NA, NA, NA, 2177091, NA, 55...
## $ totalTestsPeopleViral       <dbl> NA, NA, NA, NA, 1858835, NA, 1303345, N...
## $ totalTestsAntibody          <dbl> NA, NA, NA, NA, 324293, NA, 187631, NA,...
## $ positiveTestsAntibody       <dbl> NA, NA, NA, NA, NA, NA, 14007, NA, NA, ...
## $ negativeTestsAntibody       <dbl> NA, NA, NA, NA, NA, NA, 173624, NA, NA,...
## $ totalTestsPeopleAntibody    <dbl> NA, 66104, NA, NA, NA, NA, NA, NA, NA, ...
## $ positiveTestsPeopleAntibody <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ negativeTestsPeopleAntibody <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ totalTestsPeopleAntigen     <dbl> NA, NA, 76277, NA, NA, NA, NA, NA, NA, ...
## $ positiveTestsPeopleAntigen  <dbl> NA, NA, 12392, NA, NA, NA, NA, NA, NA, ...
## $ totalTestsAntigen           <dbl> NA, NA, 21856, NA, NA, NA, NA, 26512, N...
## $ positiveTestsAntigen        <dbl> NA, NA, 3300, NA, NA, NA, NA, NA, NA, N...
## $ fips                        <chr> "02", "01", "05", "60", "04", "06", "08...
## $ positiveIncrease            <dbl> 607, 1768, 1598, 0, 2620, 5863, 3463, 0...
## $ negativeIncrease            <dbl> -34013, 7593, 10213, 0, 14060, 162939, ...
## $ total                       <dbl> 747895, 1427077, 1429305, 1768, 1865425...
## $ totalTestResultsIncrease    <dbl> -34013, 8920, 11491, 0, 16524, 168802, ...
## $ posNeg                      <dbl> 747895, 1427077, 1429305, 1768, 1865425...
## $ deathIncrease               <dbl> 0, 33, 12, 0, 38, 73, 10, 0, 2, 0, 87, ...
## $ hospitalizedIncrease        <dbl> 0, 0, 14, 0, 152, 0, 197, 0, 0, 0, 161,...
## $ hash                        <chr> "f2176e93601204643e0618a661e7c3603f44f4...
## $ commercialScore             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ negativeRegularScore        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ negativeScore               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ positiveScore               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ score                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ grade                       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## 
## 
## Control totals - note that validState other than TRUE will be discarded
## 
## # A tibble: 2 x 6
##   validState   cases deaths  hosp     tests     n
##   <lgl>        <dbl>  <dbl> <dbl>     <dbl> <dbl>
## 1 FALSE        44159    976    NA    457982  1185
## 2 TRUE       9717214 228262    NA 154892254 12814
## Rows: 12,814
## Columns: 6
## $ date   <date> 2020-11-07, 2020-11-07, 2020-11-07, 2020-11-07, 2020-11-07,...
## $ state  <chr> "AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", ...
## $ cases  <dbl> 607, 1768, 1598, 2620, 5863, 3463, 0, 99, 223, 4380, 1719, 1...
## $ deaths <dbl> 0, 33, 12, 38, 73, 10, 0, 2, 0, 87, 39, 0, 11, 8, 91, 45, 0,...
## $ hosp   <dbl> 105, 1015, 633, 1139, 3456, 1041, 402, 77, 115, 2672, 1859, ...
## $ tests  <dbl> -34013, 8920, 11491, 16524, 168802, 35158, 0, 4262, 5159, 48...
## Rows: 12,814
## Columns: 14
## $ date   <date> 2020-01-22, 2020-01-22, 2020-01-23, 2020-01-23, 2020-01-24,...
## $ state  <chr> "MA", "WA", "MA", "WA", "MA", "WA", "MA", "WA", "MA", "WA", ...
## $ cases  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ hosp   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ tests  <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...
## $ cpm    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ dpm    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ hpm    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ tpm    <dbl> 0.0000000, 0.0000000, 0.1471796, 0.0000000, 0.0000000, 0.000...
## $ cpm7   <dbl> NA, NA, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, NA...
## $ dpm7   <dbl> NA, NA, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, NA...
## $ hpm7   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ tpm7   <dbl> NA, NA, NA, NA, NA, NA, 0.04205130, 0.00000000, 0.06307695, ...
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'date', 'cluster' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## 
## Recency is defined as 2020-10-09 through current
## 
## Recency is defined as 2020-10-09 through current

## `summarise()` regrouping output by 'state', 'cluster', 'date' (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

saveToRDS(test_old_201108, ovrWriteError=FALSE)

Cases appear to be spiking in some of the states in the segment that had previously been less impacted. Hospitalizations in aggregate in this segment are starting to slope upwards, though not yet at the same rate as the increase in cases.

New data from USA Facts are downloaded and assessed against the existing county-level segments:

# Locations for the population, case, and death file
popLoc <- "./RInputFiles/Coronavirus/covid_county_population_usafacts.csv"
caseLoc <- "./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20201109.csv"
deathLoc <- "./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20201109.csv"

# Run old segments against new data
cty_old_20201109 <- readRunUSAFacts(maxDate="2020-11-07", 
                                    popLoc=popLoc, 
                                    caseLoc=caseLoc, 
                                    deathLoc=deathLoc, 
                                    dlCaseDeath=!(file.exists(caseLoc) & file.exists(deathLoc)),
                                    oldFile=readFromRDS("cty_20201026")$dfBurden, 
                                    existingCountyClusters=readFromRDS("cty_20201026")$clustVec
                                    )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   countyFIPS = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   population = col_double()
## )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character()
## )
## i Use `spec()` for the full column specifications.
## Rows: 929,745
## Columns: 6
## $ countyFIPS <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ countyName <chr> "Statewide Unallocated", "Statewide Unallocated", "State...
## $ state      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ stateFIPS  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ date       <date> 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020-01...
## $ cumCases   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## Warning: `expand_scale()` is deprecated; use `expansion()` instead.
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character()
## )
## i Use `spec()` for the full column specifications.
## Warning: 1 parsing failure.
##  row     col               expected actual                                                                      file
## 1366 11/7/20 no trailing characters  1,020 './RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20201109.csv'
## Rows: 929,745
## Columns: 6
## $ countyFIPS <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ countyName <chr> "Statewide Unallocated", "Statewide Unallocated", "State...
## $ state      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ stateFIPS  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ date       <date> 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020-01...
## $ cumDeaths  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## Warning: `expand_scale()` is deprecated; use `expansion()` instead.
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
## `summarise()` ungrouping output (override with `.groups` argument)

## Warning: Removed 2 rows containing missing values (geom_point).
## `summarise()` ungrouping output (override with `.groups` argument)
## 
## Shapes will be created without any floor on the number of cases per million
## Shapes will be created without any floor on the number of deaths per million
## *** Counties with 0 cases/deaths or that fall below the floor for minCase/minDeath ***
## # A tibble: 1 x 4
##   cpm_mean_is0 dpm_mean_is0 dpm_mean_ltDeath cpm_mean_ltCase
##          <dbl>        <dbl>            <dbl>           <dbl>
## 1            0           NA               NA               0
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'date', 'cluster' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## Warning: Removed 1 rows containing missing values (position_stack).
## Warning: Removed 1 rows containing missing values (geom_text).

## 
## Recency is defined as 2020-10-09 through current
## 
## Recency is defined as 2020-10-09 through current
## Warning: Removed 2 rows containing missing values (geom_point).
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)

## Warning: `expand_scale()` is deprecated; use `expansion()` instead.

## Joining, by = "fipsCounty"
## Joining, by = "fipsCounty"
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

saveToRDS(cty_old_20201109, ovrWriteError=FALSE)
## 
## File already exists: ./RInputFiles/Coronavirus/cty_old_20201109.RDS 
## 
## Not replacing the existing file since ovrWrite=FALSE
## NULL

Cases appear to be growing in many of the county clusters, though deaths per million per day remain well below the peaks observed in April in the earlier-hit counties.

Next, all-cause death data from the CDC are downloaded and assessed:

# Use data that have previously been downloaded
cdcLoc <- "Weekly_counts_of_deaths_by_jurisdiction_and_age_group_downloaded_20201110.csv"
cdcList_20201110 <- readRunCDCAllCause(loc=cdcLoc, 
                                       startYear=2015, 
                                       curYear=2020,
                                       weekThru=36, 
                                       startWeek=9, 
                                       lst=readFromRDS("test_old_201108"), 
                                       epiMap=readFromRDS("epiMonth"), 
                                       agePopData=readFromRDS("usPopBucket2020"), 
                                       cvDeathThru="2020-09-05", 
                                       cdcPlotStartWeek=10, 
                                       dlData=!file.exists(paste0("./RInputFiles/Coronavirus/", cdcLoc))
                                       )
## Rows: 178,482
## Columns: 11
## $ Jurisdiction         <chr> "Alabama", "Alabama", "Alabama", "Alabama", "A...
## $ `Week Ending Date`   <chr> "01/10/2015", "01/17/2015", "01/24/2015", "01/...
## $ `State Abbreviation` <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"...
## $ Year                 <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015...
## $ Week                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...
## $ `Age Group`          <chr> "25-44 years", "25-44 years", "25-44 years", "...
## $ `Number of Deaths`   <dbl> 67, 49, 55, 59, 47, 59, 41, 47, 59, 57, 54, 50...
## $ `Time Period`        <chr> "2015-2019", "2015-2019", "2015-2019", "2015-2...
## $ Type                 <chr> "Predicted (weighted)", "Predicted (weighted)"...
## $ Suppress             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ Note                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## Rows: 178,482
## Columns: 11
## $ Jurisdiction <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",...
## $ weekEnding   <date> 2015-01-10, 2015-01-17, 2015-01-24, 2015-01-31, 2015-...
## $ state        <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", ...
## $ year         <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, ...
## $ week         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ age          <chr> "25-44 years", "25-44 years", "25-44 years", "25-44 ye...
## $ deaths       <dbl> 67, 49, 55, 59, 47, 59, 41, 47, 59, 57, 54, 50, 58, 42...
## $ period       <chr> "2015-2019", "2015-2019", "2015-2019", "2015-2019", "2...
## $ type         <chr> "Predicted (weighted)", "Predicted (weighted)", "Predi...
## $ Suppress     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ Note         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## 
## Check Control Levels and Record Counts for Renamed Data:
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 6 x 4
##   age                    n n_deaths_na   deaths
##   <chr>              <int>       <int>    <dbl>
## 1 25-44 years        26725           5  3266337
## 2 45-64 years        32640          10 12792834
## 3 65-74 years        32632          12 12691956
## 4 75-84 years        32652          14 15776825
## 5 85 years and older 32640          15 20567595
## 6 Under 25 years     21193           0  1407550
## `summarise()` regrouping output by 'period', 'year' (override with `.groups` argument)
## # A tibble: 12 x 6
## # Groups:   period, year [6]
##    period     year type                     n n_deaths_na  deaths
##    <chr>     <int> <chr>                <int>       <int>   <dbl>
##  1 2015-2019  2015 Predicted (weighted) 15285           0 5416393
##  2 2015-2019  2015 Unweighted           15285           0 5416393
##  3 2015-2019  2016 Predicted (weighted) 15365           0 5483764
##  4 2015-2019  2016 Unweighted           15365           0 5483764
##  5 2015-2019  2017 Predicted (weighted) 15317           0 5643342
##  6 2015-2019  2017 Unweighted           15317           0 5643342
##  7 2015-2019  2018 Predicted (weighted) 15305           0 5698005
##  8 2015-2019  2018 Unweighted           15305           0 5698005
##  9 2015-2019  2019 Predicted (weighted) 15319           0 5725544
## 10 2015-2019  2019 Unweighted           15319           0 5725544
## 11 2020       2020 Predicted (weighted) 12677          34 5332986
## 12 2020       2020 Unweighted           12623          22 5236015
## `summarise()` regrouping output by 'period' (override with `.groups` argument)
## # A tibble: 3 x 5
## # Groups:   period [2]
##   period    Suppress                                       n n_deaths_na  deaths
##   <chr>     <chr>                                      <int>       <int>   <dbl>
## 1 2015-2019 <NA>                                      153182           0  5.59e7
## 2 2020      Suppressed (counts highly incomplete, <5~     56          56  0.    
## 3 2020      <NA>                                       25244           0  1.06e7
## `summarise()` regrouping output by 'period' (override with `.groups` argument)
## # A tibble: 9 x 5
## # Groups:   period [2]
##   period   Note                                            n n_deaths_na  deaths
##   <chr>    <chr>                                       <int>       <int>   <dbl>
## 1 2015-20~ <NA>                                       153182           0  5.59e7
## 2 2020     Data in recent weeks are incomplete. Only~  19873          10  8.61e6
## 3 2020     Data in recent weeks are incomplete. Only~    456           0  2.12e5
## 4 2020     Data in recent weeks are incomplete. Only~    384          35  4.69e4
## 5 2020     Data in recent weeks are incomplete. Only~   2213          11  7.28e5
## 6 2020     Data in recent weeks are incomplete. Only~     12           0  7.12e3
## 7 2020     Estimates for Pennsylvania are too low fo~     48           0  2.26e4
## 8 2020     Weights may be too low to account for und~    328           0  1.25e5
## 9 2020     <NA>                                         1986           0  8.13e5
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
##    state         Jurisdiction    n n_deaths_na   deaths
## 1     US        United States 3636           0 33131804
## 2     CA           California 3636           0  3152950
## 3     FL              Florida 3636           0  2423089
## 4     TX                Texas 3636           0  2383305
## 5     PA         Pennsylvania 3636           0  1589393
## 6     OH                 Ohio 3636           0  1432144
## 7     IL             Illinois 3636           0  1250417
## 8     NY             New York 3636           0  1183446
## 9     MI             Michigan 3636           0  1140202
## 10    NC       North Carolina 3568          28  1066679
## 11    GA              Georgia 3636           0   994220
## 12    NJ           New Jersey 3630           0   886576
## 13    TN            Tennessee 3636           0   865673
## 14    VA             Virginia 3636           0   795842
## 15    IN              Indiana 3632           0   769795
## 16    MO             Missouri 3633           0   749309
## 17    AZ              Arizona 3636           0   702644
## 18    MA        Massachusetts 3602           0   701039
## 19    YC        New York City 3632           0   685046
## 20    WA           Washington 3636           0   662228
## 21    AL              Alabama 3635           0   615046
## 22    WI            Wisconsin 3618           0   608567
## 23    MD             Maryland 3630           0   585192
## 24    SC       South Carolina 3634           0   576703
## 25    KY             Kentucky 3599           0   560150
## 26    LA            Louisiana 3628           0   541388
## 27    MN            Minnesota 3592           0   517087
## 28    CO             Colorado 3634           0   458526
## 29    OK             Oklahoma 3622           9   455633
## 30    OR               Oregon 3466           0   424398
## 31    MS          Mississippi 3574           0   374274
## 32    AR             Arkansas 3532           0   372266
## 33    CT          Connecticut 3191          14   365303
## 34    IA                 Iowa 3271           0   349432
## 35    PR          Puerto Rico 3347           0   340685
## 36    KS               Kansas 3327           0   304875
## 37    NV               Nevada 3372           0   297386
## 38    WV        West Virginia 3075           2   257638
## 39    UT                 Utah 3522           0   219851
## 40    NM           New Mexico 3210           0   210537
## 41    NE             Nebraska 2920           0   193764
## 42    ME                Maine 2712           0   164439
## 43    ID                Idaho 2834           0   157266
## 44    NH        New Hampshire 2734           0   138417
## 45    HI               Hawaii 2625           0   127727
## 46    RI         Rhode Island 2534           3   116993
## 47    MT              Montana 2618           0   113028
## 48    DE             Delaware 2627           0   102300
## 49    SD         South Dakota 2510           0    89238
## 50    ND         North Dakota 2496           0    77815
## 51    DC District of Columbia 2609           0    65724
## 52    VT              Vermont 2392           0    63461
## 53    WY              Wyoming 2379           0    48740
## 54    AK               Alaska 2412           0    43447
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## Rows: 178,482
## Columns: 11
## $ Jurisdiction <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",...
## $ weekEnding   <date> 2015-01-10, 2015-01-17, 2015-01-24, 2015-01-31, 2015-...
## $ state        <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", ...
## $ year         <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, ...
## $ week         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ age          <fct> 25-44 years, 25-44 years, 25-44 years, 25-44 years, 25...
## $ deaths       <dbl> 67, 49, 55, 59, 47, 59, 41, 47, 59, 57, 54, 50, 58, 42...
## $ period       <fct> 2015-2019, 2015-2019, 2015-2019, 2015-2019, 2015-2019,...
## $ type         <chr> "Predicted (weighted)", "Predicted (weighted)", "Predi...
## $ Suppress     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ Note         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## Rows: 87,264
## Columns: 11
## $ Jurisdiction <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",...
## $ weekEnding   <date> 2015-01-10, 2015-01-17, 2015-01-24, 2015-01-31, 2015-...
## $ state        <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", ...
## $ year         <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, ...
## $ week         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ age          <fct> 25-44 years, 25-44 years, 25-44 years, 25-44 years, 25...
## $ deaths       <dbl> 67, 49, 55, 59, 47, 59, 41, 47, 59, 57, 54, 50, 58, 42...
## $ period       <fct> 2015-2019, 2015-2019, 2015-2019, 2015-2019, 2015-2019,...
## $ type         <chr> "Predicted (weighted)", "Predicted (weighted)", "Predi...
## $ Suppress     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ Note         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## 
## 
##  *** Data suppression checks *** 
## # A tibble: 5 x 11
##   Jurisdiction weekEnding state year   week age   deaths period type  Suppress
##   <chr>        <date>     <chr> <fct> <int> <fct>  <dbl> <fct>  <chr> <chr>   
## 1 North Carol~ 2020-09-05 NC    2020     36 25-4~     NA 2020   Pred~ Suppres~
## 2 North Carol~ 2020-09-05 NC    2020     36 45-6~     NA 2020   Pred~ Suppres~
## 3 North Carol~ 2020-09-05 NC    2020     36 65-7~     NA 2020   Pred~ Suppres~
## 4 North Carol~ 2020-09-05 NC    2020     36 75-8~     NA 2020   Pred~ Suppres~
## 5 North Carol~ 2020-09-05 NC    2020     36 85 y~     NA 2020   Pred~ Suppres~
## # ... with 1 more variable: Note <chr>
## 
## Data suppression checks OK - 5 records in current week/year suppressed
## `summarise()` regrouping output by 'Jurisdiction', 'weekEnding', 'state', 'year', 'week', 'age', 'period', 'type' (override with `.groups` argument)
## Rows: 82,069
## Columns: 12
## $ Jurisdiction <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",...
## $ weekEnding   <date> 2015-01-10, 2015-01-10, 2015-01-10, 2015-01-10, 2015-...
## $ state        <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", ...
## $ year         <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, ...
## $ week         <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, ...
## $ age          <fct> Under 25 years, 25-44 years, 45-64 years, 65-74 years,...
## $ period       <fct> 2015-2019, 2015-2019, 2015-2019, 2015-2019, 2015-2019,...
## $ type         <chr> "Predicted (weighted)", "Predicted (weighted)", "Predi...
## $ Suppress     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ n            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ deaths       <dbl> 25, 67, 253, 202, 272, 320, 28, 49, 256, 222, 253, 332...
## $ Note         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## 
## First duplicate is in row number (0 means no duplicates): 0
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'cluster' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year', 'week' (override with `.groups` argument)

## `summarise()` regrouping output by 'year', 'week' (override with `.groups` argument)

## `summarise()` regrouping output by 'year', 'age', 'week' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `summarise()` regrouping output by 'state', 'quarter', 'month' (override with `.groups` argument)
## `summarise()` regrouping output by 'state' (override with `.groups` argument)

## `summarise()` regrouping output by 'state' (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)
## Joining, by = "state"

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'year' (override with `.groups` argument)

## `summarise()` regrouping output by 'age', 'quarter', 'month' (override with `.groups` argument)
## `summarise()` regrouping output by 'age' (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

saveToRDS(cdcList_20201110, ovrWriteError=FALSE)
## 
## File already exists: ./RInputFiles/Coronavirus/cdcList_20201110.RDS 
## 
## Not replacing the existing file since ovrWrite=FALSE
## NULL

The death data can then be combined and analyzed, using the function from Coronavirus_Statistics_States_v003:

combineDeathData <- function(ctp, 
                             usaf, 
                             cdc, 
                             keyState, 
                             curYear=2020,
                             minDate=as.Date(paste0(curYear, "-01-01")), 
                             perMillion=FALSE,
                             glimpseIntermediate=FALSE, 
                             facetFreeY=!perMillion, 
                             returnData=TRUE
                             ) {
    
    # FUNCTION ARGUMENTS:
    # ctp: the list with COVID Tracking Project data
    # usaf: the data frame with USA Facts data
    # cdc: the list with CDC data
    # keyState: the state(s) to be explored
    # curYear: current year
    # minDate: the minimum date to use in the CDC data
    # perMillion: boolean, should data be show on a per-million-people basis?
    # glimpseIntermediate: boolean, should glimpses of frames be provided as they are built?
    # facetFreeY: boolean, should facets be created with free_y scales (only relevant if 2+ keyStates)
    # returnData: boolean, should the data frame be returned?
    
    # STEP 0a: Extract relevant elements from lists (use frame as-is if directly passed)
    if ("list" %in% class(ctp)) ctp <- ctp[["consolidatedPlotData"]]
    if ("list" %in% class(usaf)) usaf <- usaf[["clusterStateData"]]
    if ("list" %in% class(cdc)) cdc <- cdc[["stateAgg"]]
    
    # STEP 0b: Create a mapping file of date to epiWeek
    epiMap <- tibble::tibble(date=seq.Date(from=minDate, to=as.Date(paste0(curYear, "-12-31")), by="1 day"), 
                             week=lubridate::epiweek(date)
                             )
    
    # STEP 1: Filter to only relevant data
    # STEP 1a: COVID Tracking Project
    ctp <- ctp %>%
        ungroup() %>%
        filter(name=="deaths", state %in% keyState)
    if(glimpseIntermediate) glimpse(ctp)
    # STEP 1b: USA Facts
    usaf <- usaf %>%
        ungroup() %>%
        filter(state %in% keyState)
    if(glimpseIntermediate) glimpse(usaf)
    # STEP 1c: CDC
    cdc <- cdc %>%
        ungroup() %>%
        filter(year==curYear, state %in% keyState)
    if(glimpseIntermediate) glimpse(cdc)
    
    # STEP 2a: Sum the county-level data so that it is state-level data
    usafState <- usaf %>%
        group_by(state, date) %>%
        summarize(deaths=sum(deaths), dpm7=sum(dpm7*pop)/sum(pop), pop=sum(pop)) %>%
        ungroup()
    # STEP 2b: Convert the CDC data to an estimated daily total (split the weekly total evenly)
    cdcDaily <- cdc %>%
        left_join(epiMap, by=c("week")) %>%
        select(state, week, date, cdcDeaths=deaths, cdcExcess=delta) %>%
        mutate(cdcDeaths=cdcDeaths/7, cdcExcess=cdcExcess/7)
    
    # STEP 3: Create a state death-level database by date
    dailyDeath <- select(ctp, state, date, ctpDeaths=value, ctp_dpm7=vpm7, ctp_pop=pop) %>%
        full_join(select(usafState, state, date, usafDeaths=deaths, usaf_dpm7=dpm7, usaf_pop=pop), 
                  by=c("state", "date")
                  ) %>%
        full_join(cdcDaily, by=c("state", "date")) %>%
        arrange(state, date) %>%
        mutate(ctp_death7=ctp_dpm7*ctp_pop/1000000, usaf_death7=usaf_dpm7*usaf_pop/1000000)
    if(glimpseIntermediate) glimpse(dailyDeath)

    # STEP 4a: Assign a population by state
    statePop <- dailyDeath %>%
        group_by(state) %>%
        summarize(pop=max(usaf_pop, ctp_pop, na.rm=TRUE))
    
    # STEP 4b: Plot the deaths data
    p1 <- dailyDeath %>%
        select(state, date, ctp_death7, usaf_death7, cdcExcess) %>%
        pivot_longer(-c(state, date), names_to="source", values_to="deaths") %>%
        filter(!is.na(deaths)) %>%
        left_join(statePop, by="state") %>%
        ggplot(aes(x=date, y=deaths*if(perMillion) (1000000/pop) else 1)) + 
        geom_line(aes(group=source, color=varMapper[source])) + 
        labs(x="", 
             y=paste0("Deaths", if(perMillion) " per million" else ""), 
             title=paste0(curYear, " deaths per day in ", paste0(keyState, collapse=", ")),
             subtitle=paste0("Rolling 7-day average", if(perMillion) " per million people" else ""),
             caption="CDC estimated excess all-cause deaths, weekly total divided by 7 to estimate daily total"
             ) + 
        scale_x_date(date_breaks="1 month", date_labels="%b") + 
        scale_color_discrete("Data source") + 
        theme(legend.position="bottom") + 
        geom_hline(yintercept=0, lty=2)
    if (length(keyState) > 1) p1 <- p1 + facet_wrap(~state, scales=if(facetFreeY) "free_y" else "fixed")
    print(p1)
    
    # STEP 5: Return the daily death file
    if(returnData) dailyDeath
    
}


# Example function
combineDeathData(ctp=test_old_201108, 
                 usaf=cty_old_20201109$clusterStateData, 
                 cdc=cdcList_20201110, 
                 keyState=c("NY", "NJ", "MA", "FL", "GA", "TX", "AZ", "MS", "LA", "MI", "IL", "WI"), 
                 perMillion=FALSE, 
                 returnData=FALSE
                 )
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

combineDeathData(ctp=test_old_201108, 
                 usaf=cty_old_20201109$clusterStateData, 
                 cdc=cdcList_20201110, 
                 keyState=c("NY", "NJ", "MA", "FL", "GA", "TX", "AZ", "MS", "LA", "MI", "IL", "WI"), 
                 perMillion=TRUE, 
                 returnData=FALSE
                 )
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

In general, the shapes of the curves are well aligned, though the CDC excess-deaths generally start earlier and peak higher than the coronavirus deaths curves from COVID Tracking Project and USA Facts. In general, the northeastern states hit early have a classic epidemic peak curve, while the states hit later tend to have lower peaks (per million) but, in some cases, a much extended timeline of non-zero disease impact.

Next, an integrated state data file is created from the latest data:

ctpList <- readFromRDS("test_old_201108")
usafData <- readFromRDS("cty_old_20201109")$clusterStateData
cdcList <- readFromRDS("cdcList_20201110")

# Function to convert a COVID Tracking Project file for further processing
prepCTPData <- function(ctp) {
    
    # FUNCTION AGRUMENTS:
    # ctp: a properly formatted list or data frame containing processed COVID Tracking Project data

    # Pull the relevant data frame if a list has been passed    
    if ("list" %in% class(ctp)) ctp <- ctp[["consolidatedPlotData"]]

    # Ungroup the data, delete the state named 'cluster', and Create a value7 metric
    ctp <- ctp %>%
        ungroup() %>%
        filter(state != "cluster") %>%
        mutate(value7=ifelse(is.na(vpm7), NA, vpm7*pop/1000000))
    
    # Split state-cluster-population as a separate file unique by state
    ctpDemo <- ctp %>%
        group_by(state, cluster) %>%
        summarize(pop=max(pop, na.rm=TRUE)) %>%
        ungroup()
    
    # Create a final data file with the key elements
    ctpData <- ctp %>%
        rename(metric=name) %>%
        mutate(source="CTP", name=paste0(source, "_", metric)) %>%
        select(state, date, metric, source, name, value, value7, vpm, vpm7)
    
    # Return the key data frames
    list(ctpDemo=ctpDemo, ctpData=ctpData)
    
}

ctpPrepped <- prepCTPData(ctpList)
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
# Function to convert a USA Facts file for further processing
prepUSAFData <- function(usaf) {
    
    # FUNCTION AGRUMENTS:
    # usaf: a properly formatted list or data frame containing processed USA Facts data

    # Pull the relevant data frame if a list has been passed    
    if ("list" %in% class(usaf)) usaf <- usaf[["clusterStateData"]]

    # Sum the data to state, keeping only state-date-pop-cases-deaths, then pivot longer
    usaf <- usaf %>%
        group_by(state, date) %>%
        summarize(cases=sum(cases), deaths=sum(deaths), pop=sum(pop)) %>%
        ungroup() %>%
        pivot_longer(-c(state, date, pop), names_to="metric", values_to="value")
    
    # Create the rolling-7 for value, having grouped by state-pop-metric and sorted by date
    # Add the per million component
    usaf <- usaf %>%
        group_by(state, pop, metric) %>%
        arrange(date) %>%
        helperRollingAgg(origVar="value", newName="value7") %>%
        ungroup() %>%
        mutate(vpm=value*1000000/pop, vpm7=value7*1000000/pop)
    
    # Split state-pop as a separate file unique by state
    usafDemo <- usaf %>%
        group_by(state) %>%
        summarize(pop=max(pop, na.rm=TRUE)) %>%
        ungroup()
    
    # Create a final data file with the key elements
    usafData <- usaf %>%
        mutate(source="USAF", name=paste0(source, "_", metric)) %>%
        select(state, date, metric, source, name, value, value7, vpm, vpm7)
    
    # Return the key data frames
    list(usafDemo=usafDemo, usafData=usafData)
    
}

usafPrepped <- prepUSAFData(usafData)
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
# Function to convert a CDC file for further processing
prepCDCData <- function(cdc, 
                        popData,
                        startYear=2020, 
                        startDate=as.Date(paste0(startYear, "-01-01")), 
                        endDate=as.Date(paste0(startYear, "-12-31"))
                        ) {
    
    # FUNCTION AGRUMENTS:
    # cdc: a properly formatted list or data frame containing processed CDC data
    # popData: a file containing fields state-pop
    # startYear: starting year (CDC data will be filtered for this year and later)
    # startDate: the starting date for use in the mapping file to create daily estimates
    # endDate: the ending date for use in the mapping file to create daily estimates

    # Pull the relevant data frame if a list has been passed    
    if ("list" %in% class(cdc)) cdc <- cdc[["stateAgg"]]

    # Create a mapping file of dates to epiweek-epiyear
    epiMap <- tibble::tibble(date=seq.Date(from=startDate, to=endDate, by="1 day"), 
                             year=lubridate::epiyear(date),
                             week=lubridate::epiweek(date)
                             )
    
    # Filter the data to the relevant year and keep state-year-week-deaths-excess
    cdc <- cdc %>%
        filter(yearint >= startYear) %>%
        select(state, yearint, week, deaths, excess=delta)

    # Merge in the daily mapping file, divide all totals by 7 to reflect weekly to daily, and pivot longer
    cdc <- cdc %>%
        left_join(epiMap, by=c("yearint"="year", "week"="week")) %>%
        mutate(deaths=deaths/7, excess=excess/7) %>%
        select(state, date, deaths, excess) %>%
        pivot_longer(-c(state, date), names_to="metric", values_to="value")
    
    # Create the rolling-7 for value, having grouped by state-metric and sorted by date
    # Add the per million component
    cdc <- cdc %>%
        group_by(state, metric) %>%
        arrange(date) %>%
        helperRollingAgg(origVar="value", newName="value7") %>%
        ungroup() %>%
        left_join(select(popData, state, pop), by="state") %>%
        mutate(vpm=value*1000000/pop, 
               vpm7=value7*1000000/pop, 
               source="CDC", 
               name=paste0(source, "_", metric)
               ) %>%
        select(state, date, pop, metric, source, name, value, value7, vpm, vpm7)
    
    # Return the key data frame as a list
    list(cdcDemo=select(cdc, state, pop), cdcData=select(cdc, -pop))
    
}

# Create an integrated state demographics file
demoData <- ctpPrepped$ctpDemo %>%
    rename(popCTP=pop) %>%
    full_join(rename(usafPrepped$usafDemo, popUSAF=pop), by="state") %>%
    mutate(pop=pmax(popCTP, popUSAF))

cdcPrepped <- prepCDCData(cdcList, popData=demoData)



# Integrated state data
stateData <- ctpPrepped$ctpData %>%
    bind_rows(usafPrepped$usafData) %>%
    bind_rows(cdcPrepped$cdcData)
glimpse(stateData)
## Rows: 104,103
## Columns: 9
## $ state  <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", ...
## $ date   <date> 2020-03-06, 2020-03-07, 2020-03-08, 2020-03-09, 2020-03-10,...
## $ metric <chr> "cases", "cases", "cases", "cases", "cases", "cases", "cases...
## $ source <chr> "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP...
## $ name   <chr> "CTP_cases", "CTP_cases", "CTP_cases", "CTP_cases", "CTP_cas...
## $ value  <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 3, 0, 6, 2, 8, 0, 14, 6,...
## $ value7 <dbl> NA, NA, NA, 0.0000000, 0.1428571, 0.1428571, 0.1428571, 0.14...
## $ vpm    <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, ...
## $ vpm7   <dbl> NA, NA, NA, 0.0000000, 0.1934601, 0.1934601, 0.1934601, 0.19...
# Control totals
stateData %>%
    group_by(name) %>%
    summarize(value=sum(value, na.rm=TRUE), value7=sum(value7, na.rm=TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 3
##   name             value     value7
##   <chr>            <dbl>      <dbl>
## 1 CDC_deaths    2224832.   2173735.
## 2 CDC_excess     242466.    240886.
## 3 CTP_cases     9717214    9372412.
## 4 CTP_deaths     228262     224869.
## 5 CTP_hosp      9301372    9036822.
## 6 CTP_tests   154892254  150735326.
## 7 USAF_cases    9679664    9336904.
## 8 USAF_deaths    233286     230182.

The alignment of cases and deaths data can then be plotted:

plotStateMetric <- function(df, 
                            yVal, 
                            namesPlot, 
                            keyStates, 
                            namesSec=NULL,
                            scaleSec=NULL,
                            plotTitle=NULL,
                            plotSub=NULL,
                            plotCaption=NULL,
                            primYLab=NULL,
                            secYLab="Caution, different metric and scale",
                            facetFixed=TRUE,
                            mapper=varMapper, 
                            combStates=vector("character", 0), 
                            popData=NULL, 
                            yValPerCap=(yVal %in% c("vpm", "vpm7")), 
                            printPlot=TRUE, 
                            returnData=FALSE
                            ) {
    
    # FUNCTION ARGUMENTS:
    # df: data frame with integrated state data
    # yVal: column to use for the yValues
    # namesPlot: the values of column 'name' to be kept and plotted
    # keyStates: states to be included
    #            if more than one state is passed, facets will be created
    # namesSec: names to be plotted on a secondary y-axes
    # scaleSec: scale to be used for the secondary axis 
    #           namesSec/scaleSec should be similar in magnitude to namesPlot
    # plotTitle: plot title to be used (NULL means none)
    # plotSub: plot subtitle to be used (NULL means none)
    # plotCaption: plot caption to be used (NULL means none)
    # primYLab: primary y label (NULL means use mapper)
    # secYLab: secondary y label (default is "Caution, different metric and scale")
    # facetFixed: boolean, if TRUE scales="fixed", if FALSE scales="free_y"
    #             only relevant if length(keyStates) > 1
    # mapper: mapping file for variable names to descriptive names
    # combStates: states that should be combined together for plotting (named vector, c("state"="newName"))
    # popData: a population file for combining states
    # yValPerCap: boolean, is the y-value of type per-capita?
    # printPlot: boolean, whether to print the plots
    # returnData: boolean, whether to return the data
    
    # Routine is only set up for a secondary axis with facetFixed=TRUE
    if (!is.null(namesSec) & !facetFixed) stop("\nSecondary axis only programmed for scales='fixed'\n")
    
    # Include variables in namesSec as part of namesPlot so they are kept by filter
    if (!is.null(namesSec)) namesPlot <- unique(c(namesPlot, namesSec))
    
    # Filter the data for only the key elements
    df <- df %>%
        select_at(vars(all_of(c("state", "date", "name", yVal)))) %>%
        filter(state %in% keyStates, name %in% namesPlot)
    
    # If there is a list of states to combine, process them
    if (length(combStates) > 0) {
        if (is.null(popData)) { stop("\nCombining states requires population data\n") }
        # Create a data frame with population and new state names
        df <- df %>%
            left_join(select(popData, state, pop), by="state") %>%
            mutate(state=ifelse(state %in% names(combStates), combStates[state], state))
        # Aggregate to the new 'state' level data
        if (yValPerCap) {
            df <- df %>%
                group_by(state, date, name) %>%
                summarize(!!yVal:=sum(get(yVal)*pop)/sum(pop), pop=sum(pop))
        } else {
            df <- df %>%
                group_by(state, date, name) %>%
                summarize(!!yVal:=sum(get(yVal)), pop=sum(pop))
        }
        # Ungroup data frame
        df <- df %>%
            ungroup()
    }
    
    # If there is a secondary scale but no scaleSec has been passed, create one
    if (!is.null(namesSec) & is.null(scaleSec)) {
        maxPrimary <- df %>%
            filter(name %in% setdiff(namesPlot, namesSec)) %>%
            summarize(max(get(yVal), na.rm=TRUE)) %>%
            max()
        maxSecondary <- df %>%
            filter(name %in% namesSec) %>%
            summarize(max(get(yVal), na.rm=TRUE)) %>%
            max()
        scaleSec <- maxSecondary/maxPrimary
        cat("\nWill scale by:", scaleSec, "\n")
    }
    
    # Create the primary y-axis label from mapper if it has not been passed
    if (is.null(primYLab)) primYLab <- mapper[yVal]
    
    # Create the relevant line plot
    if (printPlot) {
        p1 <- df %>%
            filter(!is.na(get(yVal))) %>%
            ggplot(aes_string(x="date")) + 
            geom_line(data=~filter(., name %in% setdiff(namesPlot, namesSec)), 
                      aes(y=get(yVal), group=name, color=mapper[name])
                      ) + 
            scale_x_date(date_breaks="1 month", date_labels="%b") + 
            geom_hline(aes(yintercept=0), lty=2) + 
            labs(x="") + 
            theme(axis.text.x = element_text(angle = 90))
        if (!is.null(namesSec)) {
            p1 <- p1 + 
                geom_line(data=~filter(., name %in% namesSec), 
                          aes(y=get(yVal)/scaleSec, color=mapper[name], group=name)
                          ) + 
                scale_y_continuous(name=primYLab, 
                                   sec.axis=sec_axis(~.*scaleSec, name=secYLab)
                                   )
        } else {
            p1 <- p1 + scale_y_continuous(name=primYLab)
        }
        if (length(keyStates) > 1) p1 <- p1 + facet_wrap(~state, scales=if(facetFixed) "fixed" else "free_y")
        if (!is.null(plotTitle)) p1 <- p1 + labs(title=plotTitle)
        if (!is.null(plotSub)) p1 <- p1 + labs(subtitle=plotSub)
        if (!is.null(plotCaption)) p1 <- p1 + labs(caption=plotCaption)
        p1 <- p1 + scale_color_discrete("Source and metric")
        print(p1)
    }
    
    if (returnData) return(df)
    
}

# Example of combining states
ne_casedeath <- plotStateMetric(stateData, 
                                yVal="vpm7", 
                                namesPlot=c("CTP_cases"),
                                namesSec=c("CTP_deaths"), 
                                keyStates=c("NY", "NJ", "MA", "CT", "RI", "NH", "VT", "ME"),
                                combStates=c("MA"="S NE", "CT"="S NE", "RI"="S NE", 
                                             "NH"="N NE", "VT"="N NE", "ME"="N NE"
                                             ),
                                plotTitle="2020 coronavirus burden per million per day (select states)", 
                                plotSub="Cases on main y-axis, deaths on secondary y-axis", 
                                primYLab="Cases per million (7-day rolling mean)",
                                secYLab="Deaths per million (7-day rolling mean)",
                                facetFixed=TRUE, 
                                popData=usafPrepped$usafDemo,
                                returnData=TRUE
                                )
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)
## 
## Will scale by: 0.07901078

ne_casedeath
## # A tibble: 2,124 x 5
##    state date       name        vpm7     pop
##    <chr> <date>     <chr>      <dbl>   <dbl>
##  1 N NE  2020-03-03 CTP_cases     NA  626042
##  2 N NE  2020-03-03 CTP_deaths    NA  626042
##  3 N NE  2020-03-04 CTP_cases     NA 1956650
##  4 N NE  2020-03-04 CTP_deaths    NA 1956650
##  5 N NE  2020-03-05 CTP_cases     NA 1956650
##  6 N NE  2020-03-05 CTP_deaths    NA 1956650
##  7 N NE  2020-03-06 CTP_cases     NA 1956650
##  8 N NE  2020-03-06 CTP_deaths    NA 1956650
##  9 N NE  2020-03-07 CTP_cases     NA 3285978
## 10 N NE  2020-03-07 CTP_deaths    NA 3285978
## # ... with 2,114 more rows

An attempt is made to align the curves for two different metrics in a single locale:

alignCurves <- function(df, 
                        valueMetric, 
                        depName,
                        indepName=setdiff(unique(df$name), depName),
                        lagsTry=0:30, 
                        yLabel="Deaths per million", 
                        depLabel="cases",
                        textMetric=stringr::str_split(yLabel, pattern=" ")[[1]][1] %>% stringr::str_to_lower()
                        ) {
    
    # FUNCTION ARGUMENTS
    # df: a data frame containing state-date-name-valueMetric, with only 2 value types in 'name'
    # valueMetric: the name of the value metric
    # depName: the name of the dependent variable (the other will be the predictor)
    # indepName: the name of the predictor variable
    # lagsTry: the lagged values to attempt
    # yLabel: label for the y-axis
    # depLabel: label for the title (regression x-variable name)
    # textMetric: label for the title (regression y-variable name)
    
    # Check that there are only two values in column 'name'
    if (length(unique(df$name))!=2) { stop("\nFunction depends on 'name' having only two possible values\n") }
    
    # Arrange the data by state and date
    df <- df %>%
        arrange(state, date)
    
    # Function to make a data frame with a specific lag
    helperMakeLagData <- function(df, depName, indepName, valueMetric, lagValue) {
        depData <- df %>%
            filter(name==depName) %>%
            select_at(vars(all_of(c("state", "date", valueMetric)))) %>%
            purrr::set_names(c("state", "date", "depVar"))
        indepData <- df %>%
            filter(name==indepName) %>%
            group_by(state) %>%
            mutate(indepVar=lag(get(valueMetric), lagValue)) %>%
            ungroup() %>%
            select(state, date, indepVar)
        fullData <- depData %>%
            full_join(indepData, by=c("state", "date"))
        fullData
    }
    
    # Run a simple linear model for depName ~ lag(otherName, lagsTry) to assess performance
    lmResults <- vector("list", length(lagsTry))
    n <- 1
    for (lagValue in lagsTry) {
        # Run the linear model with no intercept, save, and increment
        lmResults[[n]] <- lm(depVar ~ indepVar:state + 0, 
                             data=helperMakeLagData(df, 
                                                    depName=depName, 
                                                    indepName=indepName, 
                                                    valueMetric=valueMetric, 
                                                    lagValue=lagValue
                                                    )
                             )
        n <- n + 1
    }
    
    # Find the best lag and coefficients
    dfResults <- tibble::tibble(lags=lagsTry, 
                                rsq=sapply(lmResults, FUN=function(x) summary(x)$r.squared)
                                )
    p1 <- dfResults %>%
        ggplot(aes(x=lags, y=rsq)) + 
        geom_point() + 
        labs(x="Lag", y="R-squared", title="R-squared vs. lag for aligning curves")
    print(p1)
    
    # Calculate the best lag and coefficients
    bestLag <- dfResults %>%
        filter(rsq==max(rsq)) %>%
        pull(lags)
    bestCoef <- coef(lmResults[[which(lagsTry==bestLag)]]) %>%
        as.data.frame() %>% 
        purrr::set_names("mult") %>%
        tibble::rownames_to_column("state") %>%
        mutate(state=str_replace(state, "indepVar:state", ""))
    
    # Plot the curves using the coefficients and lags
    bestDF <- helperMakeLagData(df, 
                                depName=depName, 
                                indepName=indepName, 
                                valueMetric=valueMetric, 
                                lagValue=bestLag
                                ) %>%
        filter(!is.na(indepVar)) %>%
        left_join(bestCoef, by="state") %>%
        mutate(pred=mult*indepVar)
    p2 <- bestDF %>%
        select(state, date, depVar, pred, mult) %>%
        pivot_longer(-c(state, date, mult)) %>%
        mutate(name=case_when(name=="depVar" ~ "Actual value", 
                              name=="pred" ~ "Predicted value\n(lag, mult)", 
                              TRUE ~ "Unknown Element"
                              )
               ) %>%
        ggplot(aes(x=date, y=value)) + 
        geom_line(aes(group=name, color=name)) + 
        geom_text(data=~filter(., date==max(date)), 
                  aes(x=date, y=+Inf, label=paste0("Multiplier: ", round(mult, 3))), 
                  hjust=1, 
                  vjust=1
                  ) +
        labs(x="", 
             y=yLabel, 
             title=paste0("Predicting ", 
                          textMetric, 
                          " based on lagged ", 
                          depLabel, 
                          " (best lag: ", 
                          bestLag, 
                          " days)"
                          ),
             subtitle="Best lag is based on highest correlation/R-squared, common across all facets"
             ) + 
        facet_wrap(~state) + 
        scale_x_date(date_breaks="1 month", date_labels="%b") + 
        theme(axis.text.x = element_text(angle = 90)) + 
        scale_color_discrete("Metric")
    print(p2)
    
    # Return the key data
    list(bestLag=bestLag, bestCoef=bestCoef, bestDF=bestDF, lmResults=lmResults)
    
}



createAndAlignCurves <- function(df, 
                                 yVal, 
                                 namesPlot, 
                                 keyStates,
                                 lagValueMetric,
                                 lagDepName,
                                 namesSec=NULL,
                                 scaleSec=NULL,
                                 plotTitle=NULL,
                                 plotSub=NULL,
                                 plotCaption=NULL,
                                 primYLab=NULL,
                                 secYLab="Caution, different metric and scale",
                                 facetFixed=TRUE,
                                 mapper=varMapper, 
                                 combStates=vector("character", 0), 
                                 popData=NULL, 
                                 yValPerCap = (yVal %in% c("vpm", "vpm7")), 
                                 printPlot=TRUE, 
                                 ...
                                 ) {
    
    # FUNCTION ARGUMENTS:
    # df: data frame with integrated state data
    # yVal: column to use for the yValues
    # namesPlot: the values of column 'name' to be kept and plotted
    # keyStates: states to be included
    #            if more than one state is passed, facets will be created
    # lagValueMetric: the metric to be used for checking lags (typically 'vpm7')
    # lagDepName: dependent variable (records in column 'name') to be used for the lagging process
    # namesSec: names to be plotted on a secondary y-axes
    # scaleSec: scale to be used for the secondary axis 
    #           namesSec/scaleSec should be similar in magnitude to namesPlot
    # plotTitle: plot title to be used (NULL means none)
    # plotSub: plot subtitle to be used (NULL means none)
    # plotCaption: plot caption to be used (NULL means none)
    # primYLab: primary y label (NULL means use mapper)
    # secYLab: secondary y label (default is "Caution, different metric and scale")
    # facetFixed: boolean, if TRUE scales="fixed", if FALSE scales="free_y"
    #             only relevant if length(keyStates) > 1
    # mapper: mapping file for variable names to descriptive names
    # combStates: states that should be combined together for plotting (named vector, c("state"="newName"))
    # popData: a population file for combining states
    # yValPerCap: boolean, is the y-value of type per-capita?
    # printPlot: boolean, whether to print the plots
    # ...: other arguments to be passed to alignCurves()

    # Create a frame to be used by the lagging process
    tempMetrics <- plotStateMetric(df, 
                                   yVal=yVal, 
                                   namesPlot=namesPlot,
                                   keyStates=keyStates,
                                   namesSec=namesSec, 
                                   scaleSec=scaleSec,
                                   plotTitle=plotTitle, 
                                   plotSub=plotSub, 
                                   plotCaption=plotCaption,
                                   primYLab=primYLab,
                                   secYLab=secYLab,
                                   facetFixed=facetFixed,
                                   mapper=mapper,
                                   combStates=combStates,
                                   popData=popData,
                                   yValPerCap=yValPerCap,
                                   printPlot=printPlot,
                                   returnData=TRUE  # the data must be returned for the next function
                                   )

    # Run the lagging process    
    tempLM <- alignCurves(tempMetrics, valueMetric=lagValueMetric, depName=lagDepName, ...)
    
    # Return the key values
    list(dfList=tempMetrics, lmList=tempLM)
    
}


# Example for northeastern states
neCurveList <- createAndAlignCurves(stateData, 
                                    yVal="vpm7", 
                                    namesPlot=c("CTP_cases"),
                                    namesSec=c("CTP_deaths"), 
                                    keyStates=c("NY", "NJ", "MA", "CT", "RI", "NH", "VT", "ME", "DE", "DC"),
                                    combStates=c("MA"="S NE", "CT"="S NE", "RI"="S NE", 
                                                 "NH"="N NE", "VT"="N NE", "ME"="N NE", 
                                                 "NY"="NY/NJ", "NJ"="NY/NJ", 
                                                 "DE"="DE/DC", "DC"="DE/DC"
                                                 ),
                                    plotTitle="2020 coronavirus burden per million per day (select states)", 
                                    plotSub="Cases on main y-axis, deaths on secondary y-axis", 
                                    primYLab="Cases per million (7-day rolling mean)",
                                    secYLab="Deaths per million (7-day rolling mean)",
                                    facetFixed=TRUE, 
                                    popData=usafPrepped$usafDemo, 
                                    printPlot=TRUE,
                                    lagValueMetric="vpm7", 
                                    lagDepName="CTP_deaths", 
                                    lagsTry=0:30
                                    )
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)
## 
## Will scale by: 0.07975869

## Warning: Removed 3 row(s) containing missing values (geom_path).

# Example for midwestern states
mwCurveList <- createAndAlignCurves(stateData, 
                                    yVal="vpm7", 
                                    namesPlot=c("CTP_cases"),
                                    namesSec=c("CTP_deaths"), 
                                    keyStates=c("OH", "MI", "IN", "IL"),
                                    plotTitle="2020 coronavirus burden per million per day (select states)", 
                                    plotSub="Cases on main y-axis, deaths on secondary y-axis", 
                                    primYLab="Cases per million (7-day rolling mean)",
                                    secYLab="Deaths per million (7-day rolling mean)",
                                    facetFixed=TRUE, 
                                    popData=usafPrepped$usafDemo, 
                                    printPlot=TRUE,
                                    lagValueMetric="vpm7", 
                                    lagDepName="CTP_deaths", 
                                    lagsTry=0:30
                                    )
## 
## Will scale by: 0.02321167

## Warning: Removed 3 row(s) containing missing values (geom_path).

The midwest is challenging to align. If using a single value for lag and a single value for CFR (case fatality rate), then predictions will have far too few deaths in the early months and far too many deaths in the later months.

The changes in CFR over time can also be estimated:

# Updated for automatic lag time assessment
assessStateCFR <- function(lst, 
                           keyStates, 
                           depVar, 
                           indepVar,
                           depTitleName, 
                           indepTitleName,
                           keyMetric="vpm7", 
                           lagEarlyDate=as.Date("2020-03-31"), 
                           lagMidDate=NULL,
                           lagLateDate=as.Date("2020-10-15"), 
                           lagEarlyValue=10, 
                           lagLateValue=20, 
                           lagsTry=0:30
                           ) {
    
    # FUNCTION ARGUMENTS:
    # lst: A list such as produced by createAndAlignCurves()
    # keyStates: The key states to be extracted from the list
    # depVar: the dependent variable
    # indepVar: the independent variable
    # depTitleName: the name for the dependent variable in the title
    # indepTitleName: the name for the independent variable in the plot title
    # keyMetric: the name of the key metric that is being assessed
    # lagEarlyDate: the date for the earliest lagging calculation (dates before this will be at lagEarlyValue)
    # lagMidDate: if lags are found from data, what midpoint should be used to split data as early vs late?
    #             NULL means midway between lagEarlyDate and lagLateDate
    # lagLateDate: the date for the latest lagging calculation (dates after this will be at lagLateValue)
    # lagEarlyValue: the value for lag on lagEarlyDate, will be linearly interpolated to lagLateValue/Date
    #                NULL means calculate from data and may differ by state
    # lagLateValue: the value for lag on lagLateDate, will be linearly interpolated from lagEarlyValue/Date
    #               NULL means estimate from data and may differ by state
    # lagsTry: the values for lag to be attempted if lageEarlyValue and/or lagLateValue is NULL
    
    # Extract the data for keyStates
    df <- lst[["dfList"]] %>%
        filter(state %in% keyStates, !is.na(get(keyMetric))) %>%
        pivot_wider(names_from="name", values_from=keyMetric)

    # Function for finding lag time correlations
    helperLagCor <- function(lt, lf, dp, id) {
        lf %>%
            group_by(state) %>%
            mutate(y=get(dp), x=lag(get(id), lt)) %>%
            summarize(p=cor(x, y, use="complete.obs")) %>%
            ungroup() %>%
            mutate(lag=lt)
    }
    
    # Middle date for splitting data
    if (is.null(lagMidDate)) lagMidDate <- mean(c(lagEarlyDate, lagLateDate))
    
    # Get the early lags from the data
    eLag <- map_dfr(.x=lagsTry, .f=helperLagCor, lf=filter(df, date<=lagMidDate), dp=depVar, id=indepVar) %>%
        group_by(state) %>%
        filter(p==max(p)) %>%
        filter(row_number()==1) %>%
        ungroup() %>%
        select(state, earlyLag=lag)
    
    # Get the late lags from the data
    lLag <- map_dfr(.x=lagsTry, .f=helperLagCor, lf=filter(df, date>lagMidDate), dp=depVar, id=indepVar) %>%
        group_by(state) %>%
        filter(p==max(p)) %>%
        filter(row_number()==1) %>%
        ungroup() %>%
        select(state, lateLag=lag)
    
    # Create the full lag frame, including substituting the fixed value(s) if provided
    lagFrame <- eLag %>%
        inner_join(lLag, by="state")
    if (!is.null(lagEarlyValue)) lagFrame <- lagFrame %>% mutate(earlyLag=lagEarlyValue)
    if (!is.null(lagLateValue)) lagFrame <- lagFrame %>% mutate(lateLag=lagLateValue)
    print(lagFrame)
    
    # Apply the assumed lagging information
    fullTime <- as.integer(lagLateDate-lagEarlyDate)
    df <- df %>%
        left_join(lagFrame, by="state") %>%
        arrange(state, date) %>%
        group_by(state) %>%
        mutate(eLag=lag(get(indepVar), mean(earlyLag)), 
               lLag=lag(get(indepVar), mean(lateLag)), 
               pctEarly=pmin(pmax(as.integer(lagLateDate-date)/fullTime, 0), 1), 
               x=ifelse(is.na(eLag), NA, pctEarly*eLag + (1-pctEarly)*ifelse(is.na(lLag), 0, lLag)), 
               y=get(depVar),
               mon=factor(month.abb[lubridate::month(date)], levels=month.abb)
               ) %>%
        filter(!is.na(x)) %>%
        ungroup()

    # Regression for data from keyStates
    if (length(keyStates) > 1) stateLM <- lm(y ~ x:mon:state + 0, data=df) 
    else stateLM <- lm(y ~ x:mon + 0, data=df)
    
    # Add the predicted value to df
    df <- df %>%
        mutate(pred=predict(stateLM))

    # Plot of curve overlaps
    p1 <- df %>%
        select(state, date, y, pred) %>%
        pivot_longer(-c(state, date)) %>%
        ggplot(aes(x=date, y=value)) + 
        geom_line(aes(color=c("pred"="Predicted", "y"="Actual")[name], group=name)) + 
        scale_x_date(date_breaks="1 month", date_labels="%b") + 
        labs(x="", 
             y=stringr::str_to_title(depTitleName), 
             title=paste0("Predicted vs. actual ", depTitleName)
             ) +
        scale_color_discrete("Metric") +
        facet_wrap(~state)
    print(p1)
    
    # Plot of rate by month
    p2 <- coef(stateLM) %>%
        as.data.frame() %>%
        purrr::set_names("CFR") %>%
        tibble::rownames_to_column("monState") %>%
        mutate(mon=factor(stringr::str_replace_all(monState, pattern="x:mon|:state.+", replacement=""), 
                          levels=month.abb
                          ), 
               state=if (length(keyStates)==1) keyStates 
                     else stringr::str_replace_all(monState, pattern="x:mon[A-Za-z]{3}:state", replacement="")
               ) %>%
        left_join(lagFrame, by="state") %>%
        ggplot(aes(x=mon, y=CFR)) + 
        geom_col(fill="lightblue") + 
        geom_text(aes(y=CFR/2, label=paste0(round(100*CFR, 1), "%"))) +
        geom_text(data=~filter(., mon==month.abb[lubridate::month(lagMidDate)]), 
                  aes(x=-Inf, y=Inf, label=paste0("Early Lag: ", earlyLag)), 
                  hjust=0, 
                  vjust=1
                  ) + 
        geom_text(data=~filter(., mon==month.abb[lubridate::month(lagMidDate)]), 
                  aes(x=Inf, y=Inf, label=paste0("Late Lag: ", lateLag)), 
                  hjust=1, 
                  vjust=1
                  ) + 
        labs(x="", 
             y=paste0(stringr::str_to_title(depTitleName), " as percentage of lagged ", indepTitleName), 
             title=paste0(stringr::str_to_title(depTitleName), 
                          " vs. lagged ", 
                          indepTitleName, 
                          " in state(s): ", 
                          paste0(keyStates, collapse=", ")
                          ), 
             subtitle=paste0("Assumed early lag on ", 
                             lagEarlyDate,
                             " interpolated to late lag on ", 
                             lagLateDate
                             ), 
             caption="Linear model coefficients on lagged data with no intercept used to estimate percentage"
             ) + 
        facet_wrap(~state)
    print(p2)

    # Return the data frame
    df
    
}

# Deaths vs. cases in Michigan
mwOut <- assessStateCFR(mwCurveList, 
                        keyStates=c("MI", "IL", "IN"), 
                        depVar="CTP_deaths", 
                        indepVar="CTP_cases", 
                        depTitleName="deaths", 
                        indepTitleName="cases", 
                        lagEarlyValue=NULL,
                        lagLateValue=NULL
                        )
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(keyMetric)` instead of `keyMetric` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 3
##   state earlyLag lateLag
##   <chr>    <int>   <int>
## 1 IN           2       2
## 2 IL           5       3
## 3 MI          10       0

# Deaths vs. cases in NY/NJ and S NE
neOut <- assessStateCFR(neCurveList, 
                        keyStates=c("NY/NJ", "S NE"), 
                        depVar="CTP_deaths", 
                        indepVar="CTP_cases", 
                        depTitleName="deaths", 
                        indepTitleName="cases", 
                        lagEarlyValue=NULL,
                        lagLateValue=NULL
                        )
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
##   state earlyLag lateLag
##   <chr>    <int>   <int>
## 1 NY/NJ        6      24
## 2 S NE         6       0

## Warning: Removed 4 rows containing missing values (position_stack).
## Warning: Removed 4 rows containing missing values (geom_text).

The CFR declines in more recent months, possibly as a function of a greater number of tests finding less serious disease. Lag times are variable but typically seem to be around a week. The use of both lag times and variable CFR by month introduces some risk of over-fitting.

The process for investigating lags and leads can also be refreshed with new data:

lagVectorWindows <- function(v1, 
                             v2, 
                             lagsTry=0:30, 
                             windowSize=30, 
                             minNoNA=ceiling(windowSize/2), 
                             updateStatus=FALSE, 
                             isLag=TRUE
                             ) {
    
    # FUNCTION ARGUMENTS:
    # v1: the first vector, which will be used 'as is'
    # v2: the second vector, which will be lagged/led by various values for lagsTry
    # lagsTry: the values for x that will be used in cor(v1, lag/lead(v2, x))
    # windowSize: the size of the window to use in taking snippets of v1 and lagged/led v2
    # minNoNA: minimum number of non-NA lagged/led values needed to calculate a correlation
    # updateStates: whether to print which window is being worked on
    # isLag: boolean, should a lag or a lead be applied (TRUE is lag, FALSE is lead)
    
    # Find the function to be used
    func <- if (isLag) lag else lead
    
    # Find the list of all possible window start points
    windowStarts <- 1:(length(v1)-windowSize+1)
    
    # Helper function to create a frame of correlations
    helperCorr <- function(s) {
        
        # Create the end point for the vector
        e <- s + windowSize - 1
        
        # Announce the start
        if (updateStatus) cat("\nProcessing window starting at:", s)
        
        # Create the correlations tibble for all values of lag, and return
        tibble::tibble(startWindow=s, endWindow=e, lags=lagsTry) %>%
            mutate(na1=sum(is.na(v1[s:e])), 
                   na2=sapply(lags, FUN=function(x) sum(is.na(func(v2, x)[s:e]))), 
                   p=sapply(lags, 
                            FUN=function(x) { 
                                ifelse(sum(!is.na(func(v2, x)[s:e])) < minNoNA,
                                       NA, 
                                       cor(v1[s:e], func(v2, x)[s:e], use="complete.obs")
                                       ) 
                                } 
                            )
                   )
    }

    # Bind the correlations frames and return
    map_dfr(windowStarts, .f=helperCorr)
    
}



# Function to assess correlations by lag/lead and window by state
stateCorr <- function(lst, 
                      keyState, 
                      met="vpm7", 
                      v1Name="CTP_deaths", 
                      v2Name="CTP_cases", 
                      windowSize=42, 
                      isLag=TRUE
                      ) {
    
    # FUNCTION ARGUMENTS:
    # lst: the processed list
    # keyState: the state of interest
    # met: the metric of interest
    # v1Name: the name of the first vector (this is considered fixed)
    # v2Name: the name of the second vector (this will have the lead/lag applied to it)
    # windowSize: number of days in the window
    # isLag: boolean, whether to use lag (TRUE) or lead (FALSE) on v2Name
    
    # Extract the data for the key State
    df <- lst[["dfList"]] %>%
        filter(state %in% keyState, !is.na(get(met))) %>%
        arrange(state, date, name)
    
    # Get the minimum date that is common to both
    minDate <- df %>%
        group_by(name) %>%
        summarize(date=min(date)) %>%
        pull(date) %>%
        max()
    
    # Extract v1 and v2
    v1 <- df %>% 
        filter(name==v1Name, date>=minDate) %>% 
        pull(met)
    v2 <- df %>% 
        filter(name==v2Name, date>=minDate) %>% 
        pull(met)

    # Confirm that dates are the same for both vectors
    dfDates1 <- df %>% filter(name==v1Name, date>=minDate) %>% pull(date)
    dfDates2 <- df %>% filter(name==v2Name, date>=minDate) %>% pull(date)
    if (!all.equal(dfDates1, dfDates2)) stop("\nDate mismatch\n")

    # Find the lags in the data
    dfLags <- lagVectorWindows(v1, v2, lagsTry=0:30, windowSize=windowSize, isLag=isLag) %>%
        mutate(windowStartDate=dfDates1[startWindow])

    # Give the description of the lag or lead
    descr <- ifelse(isLag, "lag (days)", "lead (days)")
    
    # Boxplot of correlations by lag
    p1 <- dfLags %>%
        filter(!is.na(p)) %>%
        ggplot(aes(x=factor(lags), y=p)) + 
        geom_boxplot(fill="lightblue") + 
        labs(x=stringr::str_to_title(descr), 
             y="Correlation", 
             title=paste0("Box plot of correlation by ", descr)
             )
    print(p1)

    # Plot of best lags by starting date
    p2 <- dfLags %>%
        filter(!is.na(p)) %>%
        group_by(startWindow) %>%
        filter(p==max(p)) %>%
        ggplot(aes(x=windowStartDate, y=lags)) + 
        geom_point(aes(size=p)) + 
        labs(x="Window start date", 
             y=paste0("Best ", descr), 
             title=paste0("Best ", descr, " by window starting date")
             ) + 
        scale_size_continuous(paste0("p at best ", stringr::str_replace(descr, " .*", ""))) + 
        scale_x_date(date_breaks="1 month", date_labels="%b")
    print(p2)
    
    # Plot of correlations by lag
    p3 <- dfLags %>%
        filter(!is.na(p)) %>%
        ggplot(aes(x=windowStartDate, y=lags)) + 
        geom_tile(aes(fill=p)) + 
        labs(x="Window start date", 
             y=stringr::str_to_title(descr), 
             title=paste0(stringr::str_to_title(descr), " by window starting date")
             ) + 
        scale_color_continuous(paste0("p at ", stringr::str_replace(descr, " .*", ""))) + 
        scale_x_date(date_breaks="1 month", date_labels="%b")
    print(p3)

    # Rename variable lags to leads if isLag is FALSE
    if (isFALSE(isLag)) dfLags <- dfLags %>%
        rename(leads=lags)
    
    # Return dfLags
    dfLags
    
}


miLeadData <- stateCorr(mwCurveList, keyState="MI", v1Name="CTP_cases", v2Name="CTP_deaths", isLag=FALSE)
## `summarise()` ungrouping output (override with `.groups` argument)

nynjLeadData <- stateCorr(neCurveList, keyState="NY/NJ", v1Name="CTP_cases", v2Name="CTP_deaths", isLag=FALSE)
## `summarise()` ungrouping output (override with `.groups` argument)

And the full process can be run for the state of Texas:

# Example for southern states
soCurveList <- createAndAlignCurves(stateData, 
                                    yVal="vpm7", 
                                    namesPlot=c("CTP_cases"),
                                    namesSec=c("CTP_deaths"), 
                                    keyStates=c("AZ", "FL", "GA", "TX"),
                                    plotTitle="2020 coronavirus burden per million per day (select states)", 
                                    plotSub="Cases on main y-axis, deaths on secondary y-axis", 
                                    primYLab="Cases per million (7-day rolling mean)",
                                    secYLab="Deaths per million (7-day rolling mean)",
                                    facetFixed=TRUE, 
                                    popData=usafPrepped$usafDemo, 
                                    printPlot=TRUE,
                                    lagValueMetric="vpm7", 
                                    lagDepName="CTP_deaths", 
                                    lagsTry=0:30
                                    )
## 
## Will scale by: 0.02108297

## Warning: Removed 3 row(s) containing missing values (geom_path).

assessStateCFR(soCurveList, 
               keyStates=c("AZ", "FL", "GA", "TX"), 
               depVar="CTP_deaths", 
               indepVar="CTP_cases", 
               depTitleName="deaths", 
               indepTitleName="cases"
               )
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 3
##   state earlyLag lateLag
##   <chr>    <dbl>   <dbl>
## 1 GA          10      20
## 2 TX          10      20
## 3 AZ          10      20
## 4 FL          10      20

## Warning: Removed 4 rows containing missing values (position_stack).
## Warning: Removed 4 rows containing missing values (geom_text).

## # A tibble: 967 x 13
##    state date       CTP_cases CTP_deaths earlyLag lateLag   eLag  lLag pctEarly
##    <chr> <date>         <dbl>      <dbl>    <dbl>   <dbl>  <dbl> <dbl>    <dbl>
##  1 AZ    2020-03-17      1.17     0            10      20 0.0837    NA        1
##  2 AZ    2020-03-18      1.92     0.0209       10      20 0.146     NA        1
##  3 AZ    2020-03-19      2.93     0.0418       10      20 0.146     NA        1
##  4 AZ    2020-03-20      5.17     0.0418       10      20 0.126     NA        1
##  5 AZ    2020-03-21      7.05     0.105        10      20 0.146     NA        1
##  6 AZ    2020-03-22      8.83     0.126        10      20 0.146     NA        1
##  7 AZ    2020-03-23     11.2      0.167        10      20 0.272     NA        1
##  8 AZ    2020-03-24     14.0      0.272        10      20 0.293     NA        1
##  9 AZ    2020-03-25     16.1      0.293        10      20 0.398     NA        1
## 10 AZ    2020-03-26     16.0      0.314        10      20 0.732     NA        1
## # ... with 957 more rows, and 4 more variables: x <dbl>, y <dbl>, mon <fct>,
## #   pred <dbl>
txLeadData <- stateCorr(soCurveList, keyState="TX", v1Name="CTP_cases", v2Name="CTP_deaths", isLag=FALSE)
## `summarise()` ungrouping output (override with `.groups` argument)

The process for assessing integrated data is converted to a main function:

integrateStateData <- function(stateData=NULL, 
                               popData=NULL,
                               ctpList=NULL,
                               usafData=NULL,
                               cdcList=NULL, 
                               glimpseStateData=NULL, 
                               runAll=FALSE,
                               runTwoAxis=runAll, 
                               yVal="vpm7", 
                               var1="CTP_cases", 
                               var2="CTP_deaths", 
                               keyStates=sort(c(state.abb, "DC")), 
                               combStates=vector("character", 0), 
                               mapper=varMapper, 
                               runCreateAlign=runAll, 
                               lagsTry=0:30, 
                               runCFR=runAll, 
                               lagEarlyValue=10, 
                               lagLateValue=20, 
                               lagEarlyDate=as.Date("2020-03-31"), 
                               lagMidDate=NULL,
                               lagLateDate=as.Date("2020-10-15")
                               ) {
    
    # FUNCTION ARGUMENTS:
    # stateData: an integrated state-level data file (NULL means create from components)
    # popData: a state-level population data file (must be passed if stateData is not NULL)
    # ctpList: a processed list of COVID Tracking Project data (must be provided if stateData=NULL)
    # usafData: a processed tibble of USA Facts data (must be provided if stateData=NULL)
    # cdcList: a processed list of CDC All-Cause deaths data (must be provided if stateData=NULL)
    # glimpseStateData: boolean, whether to glimpse the stateData file (NULL means only if from components)
    # runAll: boolean, whether to set al of runTwoAxis, runCreateAlign, and runCFR all to the same value
    # runTwoAxis: whether to show a plot of two metrics on two axes
    # yVal: the y-value to use from stateData
    # var1: the first of the two variables of interest (should be the leading component if a drives b)
    # var2: the second of the two variables of interest (should be the lagging component if a drives b)
    # keyStates: subset of states to use for analysis
    # combStates: states that should be combined together for plotting (named vector, c("state"="newName"))
    # mapper: mapping file for variables to descriptive names
    # runCreateAlign: boolean, should createAndAlignCurves() be run?
    # lagsTry: lag values to try (applies to both createAlignCurves and assessStateCFR)
    # runCFR: boolean, should assessStateCFR be run?
    # lagEarlyValue: the value for lag on lagEarlyDate, will be linearly interpolated to lagLateValue/Date
    #                NULL means calculate from data and may differ by state
    # lagLateValue: the value for lag on lagLateDate, will be linearly interpolated from lagEarlyValue/Date
    #               NULL means estimate from data and may differ by state
    # lagEarlyDate: the date for the earliest lagging calculation (dates before this will be at lagEarlyValue)
    # lagMidDate: if lags are found from data, what midpoint should be used to split data as early vs late?
    #             NULL means midway between lagEarlyDate and lagLateDate
    # lagLateDate: the date for the latest lagging calculation (dates after this will be at lagLateValue)
    
    # Check that either stateData or its components have been provided
    if (!is.null(stateData)) {
        cat("\nA file has been passed for stateData, components will be ignored\n")
        if (is.null(popData)) stop("Must also pass a popData file of state-level population\n")
        if (is.null(glimpseStateData)) glimpseStateData <- FALSE
    } else {
        if (is.null(ctpList) | is.null(usafData) | is.null(cdcList)) {
            stop("\nMust provided all of ctpList, usafData, cdcList when stateData is NULL\n")
        }
        cat("\nBuilding stateData from the passed components\n")
        if (is.null(glimpseStateData)) glimpseStateData <- TRUE
        
        # Create COVID Tracking Project File
        ctpPrepped <- prepCTPData(ctpList)
        
        # Create USA Facts file
        usafPrepped <- prepUSAFData(usafData)
        
        # Create an integrated state population demographics file
        demoData <- ctpPrepped$ctpDemo %>%
            rename(popCTP=pop) %>%
            full_join(rename(usafPrepped$usafDemo, popUSAF=pop), by="state") %>%
            mutate(pop=pmax(popCTP, popUSAF))
        
        # Create CDC All-Cause File
        cdcPrepped <- prepCDCData(cdcList, popData=demoData)

        # Integrated state data
        stateData <- ctpPrepped$ctpData %>%
            bind_rows(usafPrepped$usafData) %>%
            bind_rows(cdcPrepped$cdcData)
        
        # Create popData if not provided
        if (is.null(popData)) popData <- demoData %>% select(state, pop)
        
    }
    
    # Show summaries of stateData if requested
    if (glimpseStateData) {
        # Glimpse the file
        glimpse(stateData)
        # Show control totals
        stateData %>%
            group_by(name) %>%
            summarize(value=sum(value, na.rm=TRUE), value7=sum(value7, na.rm=TRUE)) %>%
            print()
    }
    
    # Run plotStateMetric if requested
    if (runTwoAxis) {
        caseDeath <- plotStateMetric(stateData, 
                                     yVal=yVal, 
                                     namesPlot=var1,
                                     namesSec=var2, 
                                     keyStates=keyStates,
                                     combStates=combStates,
                                     plotTitle="2020 coronavirus burden per million per day (select states)", 
                                     plotSub="Caution that metrics are on different axes and scales",
                                     primYLab=paste0(mapper[var1], "\n", mapper[yVal]), 
                                     secYLab=paste0(mapper[var2], "\n", mapper[yVal]), 
                                     facetFixed=TRUE, 
                                     popData=popData,
                                     returnData=TRUE
                                     )
    } else {
        caseDeath <- NULL
    }
 
    # Run createAndAlignCurves() if requested
    if (runCreateAlign) {
        curveList <- createAndAlignCurves(stateData, 
                                          yVal=yVal, 
                                          namesPlot=var1,
                                          namesSec=var2, 
                                          keyStates=keyStates,
                                          combStates=combStates,
                                          plotTitle="2020 coronavirus burden per million per day (select states)", 
                                          plotSub="Caution that metrics are on different axes and scales",
                                          primYLab=paste0(mapper[var1], "\n", mapper[yVal]), 
                                          secYLab=paste0(mapper[var2], "\n", mapper[yVal]), 
                                          facetFixed=TRUE, 
                                          popData=popData,
                                          printPlot=TRUE,
                                          lagValueMetric=yVal, 
                                          lagDepName=var2, 
                                          lagsTry=lagsTry
                                          )
        
    } else {
        curveList <- NULL
    }
    
    # Run assessStateCFR() if requested (requires that curveList have been created)
    if (runCFR & is.null(curveList)) {
        cat("\nassessStateCFR requires runCreateAlign=TRUE; skipping and setting runCFR=FALSE\n")
        runCFR <- FALSE
    }
    if (runCFR) {
        # Create the list of key states based on keyStates and combStates
        useStates <- keyStates
        for (ctr in 1:length(useStates)) { 
            if (useStates[ctr] %in% names(combStates)) useStates[ctr] <- combStates[useStates[ctr]]
        }
        useStates <- unique(useStates)
        # Run assessStateCFR
        cfrList <- assessStateCFR(curveList, 
                                  keyStates=useStates, 
                                  depVar=var2, 
                                  indepVar=var1, 
                                  depTitleName=stringr::str_replace(var2, ".*_", ""), 
                                  indepTitleName=stringr::str_replace(var1, ".*_", ""), 
                                  keyMetric=yVal,
                                  lagEarlyDate=lagEarlyDate,
                                  lagMidDate=lagMidDate,
                                  lagLateDate=lagLateDate,
                                  lagEarlyValue=lagEarlyValue,
                                  lagLateValue=lagLateValue,
                                  lagsTry=lagsTry
                                  )
    } else {
        cfrList <- NULL
    }
    
    # Return a list of the key data
    list(stateData=stateData, popData=popData, caseDeath=caseDeath, curveList=curveList, cfrList=cfrList)
    
}


# Example for creating a full stateData file
fullStateList <- integrateStateData(ctpList=readFromRDS("test_old_201108"), 
                                    usafData=readFromRDS("cty_old_20201109")$clusterStateData, 
                                    cdcList=readFromRDS("cdcList_20201110")
                                    )
## 
## Building stateData from the passed components
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
## `summarise()` regrouping output by 'state' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## Rows: 104,103
## Columns: 9
## $ state  <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", ...
## $ date   <date> 2020-03-06, 2020-03-07, 2020-03-08, 2020-03-09, 2020-03-10,...
## $ metric <chr> "cases", "cases", "cases", "cases", "cases", "cases", "cases...
## $ source <chr> "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP", "CTP...
## $ name   <chr> "CTP_cases", "CTP_cases", "CTP_cases", "CTP_cases", "CTP_cas...
## $ value  <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 3, 0, 6, 2, 8, 0, 14, 6,...
## $ value7 <dbl> NA, NA, NA, 0.0000000, 0.1428571, 0.1428571, 0.1428571, 0.14...
## $ vpm    <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, ...
## $ vpm7   <dbl> NA, NA, NA, 0.0000000, 0.1934601, 0.1934601, 0.1934601, 0.19...
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 3
##   name             value     value7
##   <chr>            <dbl>      <dbl>
## 1 CDC_deaths    2224832.   2173735.
## 2 CDC_excess     242466.    240886.
## 3 CTP_cases     9717214    9372412.
## 4 CTP_deaths     228262     224869.
## 5 CTP_hosp      9301372    9036822.
## 6 CTP_tests   154892254  150735326.
## 7 USAF_cases    9679664    9336904.
## 8 USAF_deaths    233286     230182.
fullStateList
## $stateData
## # A tibble: 104,103 x 9
##    state date       metric source name      value value7   vpm   vpm7
##    <chr> <date>     <chr>  <chr>  <chr>     <dbl>  <dbl> <dbl>  <dbl>
##  1 AK    2020-03-06 cases  CTP    CTP_cases     0 NA      0    NA    
##  2 AK    2020-03-07 cases  CTP    CTP_cases     0 NA      0    NA    
##  3 AK    2020-03-08 cases  CTP    CTP_cases     0 NA      0    NA    
##  4 AK    2020-03-09 cases  CTP    CTP_cases     0  0      0     0    
##  5 AK    2020-03-10 cases  CTP    CTP_cases     0  0.143  0     0.193
##  6 AK    2020-03-11 cases  CTP    CTP_cases     0  0.143  0     0.193
##  7 AK    2020-03-12 cases  CTP    CTP_cases     0  0.143  0     0.193
##  8 AK    2020-03-13 cases  CTP    CTP_cases     1  0.143  1.35  0.193
##  9 AK    2020-03-14 cases  CTP    CTP_cases     0  0.429  0     0.580
## 10 AK    2020-03-15 cases  CTP    CTP_cases     0  0.857  0     1.16 
## # ... with 104,093 more rows
## 
## $popData
## # A tibble: 51 x 2
##    state      pop
##    <chr>    <dbl>
##  1 AK      738432
##  2 AL     4858979
##  3 AR     2978204
##  4 AZ     6828065
##  5 CA    39144818
##  6 CO     5456574
##  7 CT     3590886
##  8 DC      672228
##  9 DE      945934
## 10 FL    20271272
## # ... with 41 more rows
## 
## $caseDeath
## NULL
## 
## $curveList
## NULL
## 
## $cfrList
## NULL
# Example for using the full stateData file
nenynjList <- integrateStateData(stateData=fullStateList$stateData, 
                                 popData=fullStateList$popData,
                                 runAll=TRUE,
                                 keyStates=c("NY", "NJ", "MA", "CT", "RI", "NH", "VT", "ME"),
                                 combStates=c("MA"="S NE", "CT"="S NE", "RI"="S NE", 
                                              "NH"="N NE", "VT"="N NE", "ME"="N NE"
                                              ), 
                                 lagEarlyValue=NULL,
                                 lagLateValue=NULL
                                 )
## 
## A file has been passed for stateData, components will be ignored
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)
## 
## Will scale by: 0.07901078
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)

## 
## Will scale by: 0.07901078

## Warning: Removed 3 row(s) containing missing values (geom_path).
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 4 x 3
##   state earlyLag lateLag
##   <chr>    <int>   <int>
## 1 N NE         3      27
## 2 NY           6      17
## 3 S NE         6       0
## 4 NJ           8      30

## Warning: Removed 8 rows containing missing values (position_stack).
## Warning: Removed 8 rows containing missing values (geom_text).

# Example for alignment between USA Facts and CTP
testList <- integrateStateData(stateData=fullStateList$stateData, 
                               popData=fullStateList$popData,
                               runAll=TRUE,
                               keyStates=c("NY", "NJ", "MA", "CT", "RI", "NH", "VT", "ME"),
                               combStates=c("MA"="S NE", "CT"="S NE", "RI"="S NE", 
                                            "NH"="N NE", "VT"="N NE", "ME"="N NE"
                                            ), 
                               var1="CTP_deaths", 
                               var2="USAF_deaths",
                               lagEarlyValue=NULL,
                               lagLateValue=NULL
                               )
## 
## A file has been passed for stateData, components will be ignored
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)
## 
## Will scale by: 1.281759
## `summarise()` regrouping output by 'state', 'date' (override with `.groups` argument)

## 
## Will scale by: 1.281759

## Warning: Removed 2 row(s) containing missing values (geom_path).
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 4 x 3
##   state earlyLag lateLag
##   <chr>    <int>   <int>
## 1 N NE         0       0
## 2 NJ           0       0
## 3 S NE         1       2
## 4 NY           2       4

## Warning: Removed 8 rows containing missing values (position_stack).

## Warning: Removed 8 rows containing missing values (geom_text).

The differences in data reporting between CTP and USA Facts stand out. Lags are generally zero, though the spikes mean that each source has a different cumulative number of deaths. The process can similarly be run for the Great Lakes states (excluding Pennsylvania and New York):

# Example for using the full stateData file
glakesList <- integrateStateData(stateData=fullStateList$stateData, 
                                 popData=fullStateList$popData,
                                 runAll=TRUE,
                                 keyStates=c("MN", "WI", "IL", "IN", "MI", "OH"),
                                 lagEarlyValue=NULL,
                                 lagLateValue=NULL
                                 )
## 
## A file has been passed for stateData, components will be ignored
## 
## Will scale by: 0.01614313

## 
## Will scale by: 0.01614313

## Warning: Removed 3 row(s) containing missing values (geom_path).
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 6 x 3
##   state earlyLag lateLag
##   <chr>    <int>   <int>
## 1 IN           2       2
## 2 MN           3      13
## 3 WI           3      30
## 4 IL           5       3
## 5 MI          10       0
## 6 OH          11      19

Creating separate lags by state as well as separate CFR by month is important in the Great Lakes. Michigan had a very large spike early when there was little case-death lag and a high CFR. By contrast, Wisconsin had a large late spike when there was more meaningful case-death lag and a much lower CFR. Indiana especially needs the variation by month as its early spike had ~6% CFR and its late spike had ~1% CFR.